ABSTRACT 

Tools used in scaling proficiency scores from the 
Second International Assessment of Educational Progress (IAEP) are 
described. The second IAEP study, conducted in 1991, was an 
international comparative study of the mathematics and science skills 
of samples of 9- and 13-year-old students from 20 countries. This 
paper focuses on part of the second stage of data analysis, work done 
in creating a unique scale for all the participating populations, 
that is, creating reference populations, scaling methodology, and 
linkage of 9- and 13-year-old populations. All populations 
contributed to four combined reference populations, or superpopulations, 
one for each age group by mathematics and science combination. The 
item response theory scaling model is the three-parameter logistic 
model. Linking was accomplished through a small set of common items 
included in the 9- and 13-year-old assessments in each subject area. 
Results show that even if on an item-by-item basis the equating gives 
results that are not ideal, when all items are taken into account, 
student proficiency scores estimated with both sets of item parameter 
estimates can be considered to be on the same scale. Results further 
indicate that no information has been lost as a result of using one 
set of item parameter estimates over another. Fourteen tables and 16 
figures present analysis data. (SLD) 
Introduction 



This paper describes the tools used in scaling proficiency scores from the second 
International Assessment of Educational Progress (IAEP) 1 . The reader already acquainted with 
volume two of the IAEP technical report will find many similarities between this paper and that 
report. This is to be expected, since both are authored by the same person and describe the same work. 
Moreover, the reader acquainted with NAEP technical report 17-R-20 will find that the same 
technical tools were used in NAEP and IAEP. The important difference is that these tools were 
used with much more heterogeneous populations than in any previous study: up to 36 different 
populations assessed in 13 different languages. This heterogeneity made putting 
proficiency scores on the same scale a delicate task that had to be undertaken carefully. 

Since this paper draws heavily on volume two of the IAEP technical report, and since 
that report was written in collaboration, I would like to thank E.G. Johnson, R.J. Mislevy, P.J. 
Pashley, K.M. Sheehan and R.J. Zwick, all from Educational Testing Service, whose 
collaboration and experience were invaluable. I would also like to thank Nancy Mead, Archie 
Lapointe and Jan Askew for their description of the IAEP populations, which appears in the next 
section. 



The Second International Assessment of Educational Progress 

The second IAEP study, conducted in 1991, was an international comparative study of the 
mathematics and science skills of samples of 9- and 13-year-old students from 20 countries. IAEP 
was designed to collect and report data on what students know and can do, on the educational and 
cultural factors associated with achievement, and on students' attitudes, backgrounds, and 
classroom experiences. 



1 The second International Assessment of Educational Progress was supported financially by the National 
Science Foundation and the U.S. Department of Education's National Center for Education Statistics for the expenses 
of overall coordination, sampling, data analysis, and reporting. The Carnegie Corporation provided additional funds 
to cover the travel expenses of some of the participants who could not meet the financial burdens of traveling to the 
project's coordination and training meetings held in Canada, England, France, Hong Kong, and the United States. 
Decisions concerning the design and implementation of the project were made in collaboration with the 
representatives of the countries involved in the survey. The National Academy of Sciences' Board on International 
Comparative Studies in Education reviewed plans for IAEP at several stages of its development and made 
suggestions to improve the technical quality of the study. The board is responsible for reviewing the soundness of 
the technical procedures of international studies funded by federal agencies of the U.S. government. 
This project was a four-part survey: a main assessment of 13-year-olds' performance in 
mathematics and science; an assessment of 9-year-olds' performance in mathematics and science; 
an experimental, performance-based assessment of 13-year-olds' ability to use equipment and 
materials to solve mathematics and science problems; and a short probe of the geography skills and 
knowledge of 13-year-olds. All countries participated in the main assessment of 13-year-olds; 
participation in the other assessment components was optional. 

Some countries drew samples from virtually all children in the appropriate age group; 
others confined their assessments to specific geographic areas, language groups or grade levels. 
The definition of populations often followed the structure of school systems, political divisions, 
and cultural distinctions. For example, the sample in Israel focused on students in 
Hebrew-speaking schools, which share a common curriculum, language and tradition. All countries limited 
their assessment to students who were in school, which for some participants meant excluding 
significant numbers of age-eligible children. In a few cases, a sizable proportion of the selected 
schools or students did not participate in the assessment, and therefore results are subject to 
possible nonresponse bias. 

A list of the participants is provided below with a description of limitations of the 
populations assessed. Unless noted, 90 percent or more of the age-eligible children in a population 
are in school. For countries where more than 10 percent of the age-eligible children are out of 
school, a notation of in-school population appears after the country's name. In Brazil, two separate 
samples were drawn, one each for the cities of Sao Paulo and Fortaleza. In Canada, nine out of the 
10 provinces drew separate samples of 13-year-olds and four of these drew separate samples of 
English-speaking and French-speaking schools, for a total of 14 separate samples. Four Canadian 
provinces — six separate samples — participated in the assessment of 9-year-olds. 2 These distinct 
Canadian samples coincide with the separate provincial education systems in Canada and reflect 
their concern for the two language groups they serve. The IAEP project was asked to provide 
separate results for the American state of Colorado, which opted to assess its 9- and 13-year-old 
students in mathematics, science, and geography. 



2 Taken together, the Canadian samples represent 94 percent of the 13-year-olds and 74 percent of the 9-year-olds in 
Canada. An appropriately weighted subsample of responses was drawn from these samples for the calculation of the 
statistics for Canada. 
Participants 



Brazil Cities of Sao Paulo and Fortaleza, restricted grades, in-school population 

Canada Four provinces at age 9 and nine out of 10 provinces at age 13 

China 20 out of 29 provinces and independent cities, restricted grades, in-school 
population 

England All students, low participation at ages 9 and 13 

France All students 

Hungary All students 

Ireland All students 

Israel Hebrew-speaking schools 

Jordan All students 

Korea All students 

Mozambique Cities of Maputo and Beira, in-school population, low participation 

Portugal Restricted grades, in-school population at age 13 

Scotland All students, low participation at age 9 

Slovenia All students 

Soviet Union 14 out of 15 republics, Russian-speaking schools 

Switzerland 15 out of 26 cantons 

Taiwan All students 

United States All students 



Each participating country was responsible for carrying out all aspects of the project, 
including sampling, survey administration, quality control, and data entry using standardized 
procedures that were developed for the project. Several training manuals were developed for the 
IAEP project. These comprehensive documents, discussed with participants during several 
international training sessions, explained in detail each step of the assessment process. 
Typically, a representative sample of 3300 students from 110 different schools was selected 
from each population at each age level, and half were assessed in mathematics and half in science. A 
total of about 175,000 9- and 13-year-olds (those born in calendar years 1981 and 1977, 
respectively) were tested in 13 different languages in March 1991. 

Initial results of the second IAEP have been reported in Learning Mathematics and Learning 
Science. 3 Reports of the geography and performance assessments were issued in June and July 
1992, respectively. 4 Mathematics and science results for Colorado were reported in May 1992 and 
geography results, in August 1992. 5 A technical report published in April 1992 describes the tools 
that were used to obtain these results, which was considered to be the first stage of data analysis. 6 

In a second stage of data analysis, scales for mathematics and science proficiency were 
obtained using what can be called a strong model-based psychometric strategy, item response 
theory. Item response theory utilizes a family of models that employ latent variables (i.e., variables 
that cannot be observed) that correspond to the dimensions of what is known as the "latent space". 
As mentioned in the introduction, a technical report published in November 1992 gives a detailed 
description of the tools used in this second stage. 7 

The present paper focuses on a part of the second stage of data analysis: the work done 
in creating a unique scale for all the participating populations, i.e. creating reference 
populations, scaling methodology, and linkage of the 9- and 13-year-old populations. 






3 Archie E. Lapointe, Nancy A. Mead, and Janice M. Askew. Learning Mathematics. Princeton, NJ: Educational 
Testing Service, 1992. 

Archie E. Lapointe, Nancy A. Mead, and Janice M. Askew. Learning Science. Princeton, NJ: Educational Testing 
Service, 1992. 

4 Stephen Lazer. Learning About the World. Princeton, NJ: Educational Testing Service, 1992. 
Brian McLean Semple. Performance Assessment: An International Experiment. Edinburgh, Scotland: Scottish 
Education Department, 1992. 

5 Ruth B. Ekstrom. Colorado: Meeting the Challenge in Mathematics and Science. Denver, CO: Colorado 
Department of Education, 1992. 

Ruth B. Ekstrom. Colorado: Meeting the Challenge in Geography. Denver, CO: Colorado Department of 
Education, 1992. 

6 Adam Chu, et al. IAEP Technical Report. Princeton, NJ: Educational Testing Service, 1992. 
7 Jean-Guy Blais, et al. IAEP Technical Report: vol. 2. Princeton, NJ: Educational Testing Service, 1992. 
Reference populations 



The presence of so many different and heterogeneous populations (over 20 for 9-year-olds 
and over 30 for 13-year-olds) makes the task of creating a common scale an interesting technical 
and theoretical problem. How can we create a scale with which we can reasonably compare 
different populations? This could be accomplished in different ways using different reference 
populations. In this study, no single population was chosen to serve as the reference population. 
Instead, all populations contributed to creating four combined reference populations, called 
"superpopulations", one for each age group by mathematics and science combination. 

All superpopulations were initially formed by drawing random samples of 200 students 
from each of the participating populations. 8 At age 9, 2,800 examinees were retained for each of 
the mathematics and science reference populations. At age 13, 4,000 and 3,800 examinees were 
retained for mathematics and science populations, respectively. All the analyses were conducted 
with these four data sets, but a number of analyses were repeated using the full populations. This 
was the case for item parameter estimation and plausible values computation. Analyses of the 
superpopulations were conducted without weights (i.e., each sampled student had a weight of 
one). Analyses of the full populations were conducted using transformed weights that summed to 
1650. This transformation was necessary because some of the procedures could be affected by the 
number of examinees. 
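
The weight transformation can be illustrated with a short sketch. It assumes the transformation was a simple proportional rescaling, which the text does not state explicitly (it says only that the transformed weights summed to 1650); the function name `rescale_weights` and the raw weights are hypothetical.

```python
def rescale_weights(weights, target_total=1650.0):
    """Scale sampling weights proportionally so that they sum to target_total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    factor = target_total / total
    return [w * factor for w in weights]


# Hypothetical raw sampling weights for four students (illustration only).
raw = [120.0, 80.0, 200.0, 100.0]
scaled = rescale_weights(raw)
```

A proportional rescaling leaves the relative weight of each student unchanged, so weighted means are unaffected while the effective number of examinees is held constant across populations.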



The analyses were based on the items retained after the first stage of the data analysis (see 
the first technical report). Sixty-one items were retained for 9-year-old mathematics and 75 items, 
for 13-year-old mathematics. Fifty-eight items were retained for 9-year-old science and 64 items, 
for 13-year-old science. For some populations, there were additional items that had to be deleted 
due to local problems in the translation of printed material. The maximum number of items 
removed was three. 



8 Canadian provinces did not contribute directly to the reference populations. Instead, 200 examinees were randomly 
sampled from a population that had previously been labeled "Canada" (see the first IAEP technical report). 




Scaling methodology 



The Scaling Model 

The paragraphs that follow review the scaling model employed in the analysis of the IAEP 
data. The reader is referred to Mislevy (1991) and Mislevy, Beaton, Kaplan & Sheehan (1992) for 
an introduction to plausible values methods and a comparison with standard psychometric 
analyses, to Mislevy, Johnson & Muraki (1992) for additional information, and to Rubin (1987) 
for the theoretical underpinnings of the approach. 

The item response theory (IRT) scaling model used with IAEP is the 3-parameter logistic 
(3PL) model (e.g., Lord, 1980). This model is from a family of "latent trait" models which 
quantify examinees' tendencies to provide responses in a given direction (e.g., correct answers) 
as a function of parameters that are not directly observed. 

The fundamental equation of the 3PL is the probability that a person whose proficiency is 
characterized by the unobservable variable $\theta$ will respond correctly to item j: 

$$P(X_j = 1 \mid \theta, a_j, b_j, c_j) = c_j + (1 - c_j)\,\frac{e^{D a_j (\theta - b_j)}}{1 + e^{D a_j (\theta - b_j)}} \equiv P_j(\theta)$$

where: $X_j$ is the response to item j, 1 if correct and 0 if not; 

$a_j$, where $a_j > 0$, is the slope parameter of item j, characterizing its sensitivity to 
proficiency; 

$b_j$ is the threshold parameter of item j, characterizing its difficulty; 

$c_j$, where $0 \le c_j < 1$, is the lower asymptote parameter of item j, reflecting the 
chances of a correct response from students of very low proficiency. 

In IAEP analyses, c parameters were estimated for multiple-choice items, but were fixed at zero for 
constructed response items. 
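
As an illustration, the 3PL probability can be written as a small function. This is a minimal sketch, not the BILOG implementation: the value D = 1.7 is an assumption (the text uses D without stating its value), and the item parameters below are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant; the text uses D without giving its value


def p_3pl(theta, a, b, c=0.0, d=D):
    """Probability of a correct response to one item under the 3PL model."""
    if a <= 0 or not (0.0 <= c < 1.0):
        raise ValueError("require a > 0 and 0 <= c < 1")
    z = d * a * (theta - b)
    logistic = 1.0 / (1.0 + math.exp(-z))  # equals e^z / (1 + e^z)
    return c + (1.0 - c) * logistic


# Hypothetical item parameters, for illustration only.
p_mc = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)  # multiple-choice item
p_cr = p_3pl(theta=0.0, a=1.0, b=0.0)         # constructed response, c fixed at 0
```

When proficiency equals the item threshold, the logistic term is one half, so the probability is halfway between the lower asymptote c and 1.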



Under the usual IRT assumption of local independence, the probability of a vector $x = (x_1, \ldots, x_n)$ 
of responses to n items is simply the product of terms based on the fundamental equation 
of the 3PL: 

$$P(x \mid \theta, a, b, c) = \prod_{j=1}^{n} [P_j(\theta)]^{x_j}\,[1 - P_j(\theta)]^{1 - x_j}$$

After x has been observed, this equation can be considered a likelihood function and 
provides a basis for inference about $\theta$ or about item parameters. In IAEP, estimates of item 
parameters were obtained via a marginal maximum likelihood estimation procedure (see Bock & 
Aitkin, 1981) as implemented in Mislevy and Bock's (1982) BILOG computer program. 
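
The likelihood of a response vector under local independence can be sketched as follows. The code evaluates the log of the product given above for fixed item parameters; it does not perform marginal maximum likelihood estimation. The constant D = 1.7 and the item parameters are assumptions made for illustration.

```python
import math


def p_3pl(theta, a, b, c, d=1.7):  # d = 1.7 is an assumed value for D
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log of prod_j P_j(theta)^x_j * (1 - P_j(theta))^(1 - x_j)."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Hypothetical (a, b, c) triples and one response vector (1 = correct).
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.0)]
responses = [1, 1, 0]
ll_low = log_likelihood(-2.0, responses, items)
ll_mid = log_likelihood(0.0, responses, items)
```

For this response pattern (two easier items right, the hardest item wrong), a proficiency near the middle of the scale is more likely than a very low one, so `ll_mid` exceeds `ll_low`.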

Overview of Plausible Values Methodology 

A detailed development of plausible values methodology is given in Mislevy (1991). Along 
with theoretical justifications, Mislevy's paper presents some secondary analyses and numerical 
examples. Plausible values were developed as a means of obtaining consistent estimates of selected 
population features, and approximations of others that are no worse than those that would be 
obtained using standard IRT procedures. The following paragraphs give a brief overview of the 
approach. 



Let y represent the responses of all sampled examinees to background questions. 
If IRT $\theta$ values were available for all sampled examinees, it would be possible to 
compute a statistic $t(\theta, y)$, such as a subpopulation sample mean, a sample percentile point, or a 
sample variance, to estimate a corresponding population quantity T. A function $U(\theta, y)$, e.g., 
a jackknife estimate, would be used to gauge sampling uncertainty. Because the 3PL is a latent 
variable model, however, $\theta$ values are not observed even for sampled students. If enough 
responses are solicited from each student to provide a fairly precise estimate $\hat\theta$ of their $\theta$ values, 
values of $t(\hat\theta, y)$ and $U(\hat\theta, y)$ are reported as approximations of the corresponding $t(\theta, y)$ and $U(\theta, y)$. 

Following Rubin (1987), we can think of $\theta$ as "missing data" and approximate 
$t(\theta, y)$ by its expectation given (x, y), the data that were observed, as follows: 

$$t^*(x, y) = E[t(\theta, y) \mid x, y] = \int t(\theta, y)\,p(\theta \mid x, y)\,d\theta$$



It is possible to approximate t* by using random draws from the conditional distribution 
$p(\theta \mid x_i, y_i)$ of each sampled student i. These values are referred to as imputations in the sampling 
literature, and as "plausible values" in IAEP. The value of $\theta$ for any respondent that would enter 
in the computation of t is thus replaced by a randomly selected value from the conditional 
distribution for $\theta$ given his or her responses to cognitive items ($x_i$) and background items ($y_i$). 
Rubin (1988) proposes this process be carried out several times (multiple imputations) so that 
the uncertainty associated with imputation can be quantified. The average of the results of K 
estimates of t, each computed from a different set of plausible values, is a Monte Carlo 
approximation of the above integral; the variance among them, denoted by B, reflects uncertainty 
due to not observing $\theta$, and must be added to the estimated expectation of $U(\theta, y)$, which reflects 
uncertainty due to testing only a sample of students from the population. 

Plausible values are not test scores for individuals in the usual sense. They are offered only 
as intermediary computations for calculating integrals of the form presented above in order to 
estimate population characteristics, even though they are biased estimates of the proficiencies of the 
individuals with whom they are associated. 

Computing Plausible Values in IRT-based Scales 

Plausible values for each respondent i are drawn from the conditional distribution $p(\theta \mid x_i, y_i)$. 
This section describes how, in IRT-based scales, these conditional distributions are 
characterized and how the draws are taken. 

Using conditional independence we have: 

$$p(\theta \mid x_i, y_i) \propto P(x_i \mid \theta)\,p(\theta \mid y_i),$$

where $P(x_i \mid \theta)$ is the likelihood function for $\theta$ induced by observing $x_i$ (treating item parameter 
estimates as known true values) and $p(\theta \mid y_i)$ is the distribution of $\theta$ given the observed value $y_i$ of 
background responses. 



In the analysis of IAEP data, a normal (Gaussian) form was assumed for $p(\theta \mid y_i)$, 
with a common dispersion and with a mean given by a main-effects model for selected elements of 
the complete vector of background variables. The background variables included will be referred to 
as the conditioning variables 9 and will be denoted $y^c$. The following model was fit for each 
subject group for each age (i.e., mathematics and science for 9- and 13-year-olds): 

$$\theta = \Gamma' y^c + \varepsilon$$

where $\varepsilon$ is normally distributed with mean 0 and dispersion $\Sigma$, and $\Gamma$ and $\Sigma$ are the parameters to be 
estimated. Since the subject areas in IAEP were considered to have just one scale, $\Gamma$ is a vector 
and $\Sigma$ is a scalar. If we had decided to use subscales, then $\Gamma$ would have been a matrix and $\Sigma$ a 
covariance matrix. Like item parameter estimates, these estimates of conditional distributions were 
treated as known true values in subsequent steps of the analysis. Maximum likelihood estimates of 
$\Gamma$ and $\Sigma$ were obtained with Sheehan's (1985) M-Group computer program, using a variant of the 
EM solution described in Mislevy (1985). 

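The conditional distribution $p(\theta \mid y_i)$ has been assumed normal, with mean $\mu_i^c = \Gamma' y_i^c$ and 
variance $\Sigma$; if the likelihood $P(x_i \mid \theta)$ is approximated by another normal distribution, with mean 
$\mu_i^L$ and variance $\Sigma_i^L$, then the posterior $p(\theta \mid x_i, y_i)$ is also normal, with variance: 

$$\Sigma_i^p = \left(\Sigma^{-1} + (\Sigma_i^L)^{-1}\right)^{-1}$$

and mean: 

$$\bar\theta_i = \left(\mu_i^c\,\Sigma^{-1} + \mu_i^L\,(\Sigma_i^L)^{-1}\right)\Sigma_i^p .$$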
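
The combination of the two normal distributions can be checked numerically. The sketch below covers the single-scale (scalar) case only; the function name `normal_posterior` and the numerical values are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, lik_mean, lik_var):
    """Combine a normal conditional distribution with a normal likelihood approximation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
    post_mean = (prior_mean / prior_var + lik_mean / lik_var) * post_var
    return post_mean, post_var


# Hypothetical values: p(theta | y) is N(0, 1); the likelihood is approximated by N(1, 0.25).
mean, var = normal_posterior(0.0, 1.0, 1.0, 0.25)
```

The posterior mean is a precision-weighted average of the two means, so it lies closer to the likelihood mean when the examinee's responses are more informative than the background variables.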

In the IAEP analysis, a normal approximation for $P(x_i \mid \theta)$ was accomplished for a given 
scale by the steps described below. These computations were carried out in the scale determined by 
parameter estimates from different runs of BILOG (Mislevy & Bock, 1982). 

1- Lay out a grid of Q equally spaced points from -5 to +5, a range that should cover the 
region of the scale for each population involved. The value of Q varies from 20 to 
40, depending on the scale being used; a smaller number of points should suffice for scales 
with few items given to each respondent, while a larger number of points is required for 
scales with many items (such as in IAEP). 



9 The conditioning variables used in IAEP analyses are presented in appendix C of the second technical report. The way 
they were included in the analysis is described in a further section. 
2- At each point $X_q$, compute the likelihood $L(x_i \mid \theta = X_q)$. 



3- To improve the normal approximation in those cases in which the likelihoods are not 
roughly symmetric in the range of interest (as when all of an examinee's answers are 
correct), multiply the values from step 2 by the mild smoothing function 

$$S(X_q) = \frac{\exp(X_q + 5)}{[1 + \exp(X_q + 5)]\,[1 + \exp(X_q - 5)]}$$

This is equivalent to augmenting each examinee's response vector with responses to two 
fictitious items, one extraordinarily easy item that everyone gets right and one 
extraordinarily difficult item that everyone gets wrong. This expedient improves the normal 
approximation for examinees with flat or degenerate likelihoods in the range where their 
conditional distributions lie, but has negligible effects for examinees with even modestly 
well-determined symmetric likelihoods. 

4- Compute the mean and standard deviation of $\theta$, using as weights the smoothed likelihood values obtained in step 3. 
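
The four steps above can be sketched as follows. This is an illustrative reading of the procedure, not the code used in IAEP: it assumes that the weights in step 4 are the step 2 likelihoods multiplied by the smoothing function, that D = 1.7, and that Q = 40; the item parameters are hypothetical.

```python
import math


def p_3pl(theta, a, b, c, d=1.7):  # d = 1.7 is an assumed value for D
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def smoothing(x):
    """S(x): one very easy item answered right, one very hard item answered wrong."""
    return math.exp(x + 5) / ((1 + math.exp(x + 5)) * (1 + math.exp(x - 5)))


def normal_approx(responses, items, q=40):
    # Step 1: grid of q equally spaced points from -5 to +5.
    grid = [-5 + 10 * k / (q - 1) for k in range(q)]
    weights = []
    for x in grid:
        # Step 2: likelihood of the response vector at this grid point.
        lik = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(x, a, b, c)
            lik *= p if u == 1 else 1.0 - p
        # Step 3: multiply by the smoothing function.
        weights.append(lik * smoothing(x))
    # Step 4: weighted mean and standard deviation over the grid.
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, grid)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)) / total
    return mean, math.sqrt(var)


# Hypothetical items; an all-correct response vector has a one-sided likelihood.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.0)]
mean_all, sd_all = normal_approx([1, 1, 1], items)
mean_none, sd_none = normal_approx([0, 0, 0], items)
```

Without the smoothing function, the all-correct likelihood would keep increasing toward the upper end of the grid; the fictitious hard item pulls it back down so that a finite mean and standard deviation can be computed.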

At this stage, then, the likelihood created by a respondent's answers to the items in a given 
scale is approximated by a normal distribution. This normalized-likelihood normal posterior 
approximation is then employed in both the estimation of $\Gamma$ and $\Sigma$ and in the generation of plausible 
values. From the final estimates of $\Gamma$ and $\Sigma$, an examinee's posterior distribution is obtained from 
the normal approximation using the four-step procedure outlined above and a plausible value is 
drawn at random from this univariate normal distribution. 

Even though we do not observe the $\theta$ value of examinee i, we do observe variables that are 
related to it: $x_i$, the examinee's answers to the cognitive items, and $y_i$, the respondent's answers to 
demographic and background variables. Suppose we wish to draw inferences about a number 
$T(\theta, Y)$ that could be calculated explicitly if the $\theta$ and y values of each member of the population 
were known. Suppose further that if $\theta$ values were observable, we would be able to estimate T 
from a sample of N pairs of $\theta$ and y values by the statistic $t(\theta, y)$ [where $t(\theta, y) \equiv t(\theta_1, y_1, \ldots, \theta_N, y_N)$], 
and that we could estimate the variance in t around T due to sampling respondents by the 
function $U(\theta, y)$. Given that observations consist of $(x_i, y_i)$ rather than $(\theta_i, y_i)$, we can 
approximate t by its expected value conditional on (x, y), or (as previously seen): 

$$t^*(x, y) = E[t(\theta, y) \mid x, y] = \int t(\theta, y)\,p(\theta \mid x, y)\,d\theta$$



It is possible to approximate t* with random draws from the conditional distributions 
$p(\theta_i \mid x_i, y_i)$, which are obtained for all examinees by the method described above. Let $\hat\theta_m$ be the m-th 
such vector of plausible values, consisting of a value for the latent variable of each 
examinee. This vector is a plausible representation of what the true $\theta$ vector might have been, had 
we been able to observe it. The following steps describe how an estimate of a scalar statistic $t(\theta, y)$ 
and its sampling variance can be obtained from M (> 1) such sets of plausible values. 10 



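1- Using each set of plausible values $\hat\theta_m$ in turn, evaluate t as if the plausible values were 
true values of $\theta$. Denote the results $\hat t_m$, for m = 1, ..., M. 

2- Using a variance estimation procedure, compute the estimated sampling variance of $\hat t_m$, 
denoting the result $U_m$. 

3- The final estimate of t is: 

$$t^* = \frac{1}{M}\sum_{m=1}^{M} \hat t_m$$

4- Compute the average sampling variance over the M sets of plausible values, to approximate 
uncertainty due to sampling respondents: 

$$U^* = \frac{1}{M}\sum_{m=1}^{M} U_m$$

5- Compute the variance among the M estimates $\hat t_m$ to approximate uncertainty due to not 
observing $\theta$ values from respondents: 

$$B_M = \frac{\sum_{m=1}^{M} (\hat t_m - t^*)^2}{M - 1}$$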



10 Five sets of plausible values were used in each IAEP analysis and are provided on the IAEP public-use data tapes 
for secondary analysis. 
6- The final estimate of the variance of t* is the sum of two components: 

$$V = U^* + (1 + M^{-1})\,B_M .$$

The first term, U*, is related to the sampling variance and, in IAEP, it is estimated through a 
resampling procedure called the jackknife. Briefly, this procedure computes a 
statistic with the full data set and then recomputes the same statistic a certain number of times, taking into 
account the sampling plan, each time deleting some of the data and replacing them with contiguous 
data (thus creating so-called pseudo data sets). The standard error is then estimated using the sum 
of the squared differences between estimates with the full data set and estimates with the pseudo 
data sets. The second term, $(1 + M^{-1}) B_M$, is related to measurement error. It is the estimate of the 
uncertainty due to not observing $\theta$. It is computed from the sum of the squared differences between 
the mean obtained with each set of plausible values and the mean of these means. These two 
components are combined as shown above to form a more realistic standard error estimate. 
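
The combination of the two variance components can be sketched in a few lines. The sketch assumes that U* is the average of the sampling variances U_m, which is the usual multiple-imputation rule, and that B_M uses the divisor M - 1; the function name and the numerical values are hypothetical.

```python
def combine_plausible_values(estimates, sampling_variances):
    """Combine M estimates t_m and their sampling variances U_m into (t*, U*, B_M, V)."""
    m = len(estimates)
    if m < 2 or len(sampling_variances) != m:
        raise ValueError("need M > 1 estimates with matching variances")
    t_star = sum(estimates) / m                                  # final estimate of t
    u_star = sum(sampling_variances) / m                         # average sampling variance (assumed rule)
    b_m = sum((t - t_star) ** 2 for t in estimates) / (m - 1)    # variance among the M estimates
    v = u_star + (1 + 1 / m) * b_m                               # total variance of t*
    return t_star, u_star, b_m, v


# Hypothetical results from M = 5 sets of plausible values.
t_m = [500.0, 502.0, 498.0, 501.0, 499.0]
u_m = [4.0, 4.0, 4.0, 4.0, 4.0]
t_star, u_star, b_m, v = combine_plausible_values(t_m, u_m)
```

In this made-up example the sampling component is 4.0 and the measurement component is 1.2 times 2.5, so ignoring the second term would understate the variance of the estimate.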

Estimating variability requires computing a statistic 165 times (33 computer runs 
to obtain an estimate and a variance estimate for each of the five sets of plausible values used in 
IAEP analyses). Because the cost of the full procedure was prohibitive, an approximate procedure 
was used to produce reasonable estimates at lower cost. We estimated t on each pseudo-data set 
(in order to estimate variability due to the latency of proficiency) but computed its jackknife 
variance on only one pseudo data set to estimate sampling variability. 

The invariance principle of IRT 

Item response theory comes from the work of Ferguson (1942), Lawley (1943), Lord 
(1952) and Rasch (1960). This modeling approach rests on the hypothesis that there exists a 
relation between the probability of obtaining the observed result for a given item and the non-observable 
ability (one or many) targeted by the measurement process. It is inspired by 
statistical regression and by factor analysis. It supposes the existence of non-observable elements 
(traits, abilities, factors) influencing performance and observed results. 

Modeling in the statistical regression framework, as we find it in I.R.T., gives a 
theoretical property to the mathematical model's parameters: the invariance property. Under some 
conditions, estimates of item parameters are independent of the group of examinees with which 
the measurement is done, and estimates of examinees' ability are independent of the items included in the 
measurement process (see Blais & Ajar, 1992; Blais & Ajar, 1993). 

The invariance property is almost mythical because its empirical demonstration is 
remarkably lacking in published studies, rendering it suspect on that account (Wood, 1976). 
There has been a lot of confusion about it in applications of I.R.T., and Wood (1976) said that it was 
one of the grayest concepts in general test theory. Some presentations of the invariance property 
suggest that the model guarantees invariant estimation, that is, that invariant estimates can be 
obtained with any group of examinees or items without empirical investigation, as if the I.R.T. 
model spared us from worrying about anything else. Reality is, of course, 
somewhat more complex. Most of the time there is far to go before counting the chickens. 

Before getting into the conditions under which the invariance property holds and discussing the 
elements that could undermine it, we must place things in a general context. To do so, we will present 
the idea of statistical regression as found in some statistics textbooks (for example, Cramer 
1946, p. 270-272), the special case of linear regression, and how it can be formulated in the 
framework of I.R.T. 

Let X and Y be two continuous random variables and $f(x, y)$ their joint probability function. 
If we think of Y as a dependent variable and of X as an independent variable, then we can write 
$f(y \mid x)$, the probability function of y given x. 

For a given value of X, say x, the variable Y can take many values y. A possible 
representative of these values is the expected value (mean) of Y given the value x taken by X: 
$E(Y \mid X = x) = \mu_{y \mid x}$. 

When x varies, the point $[x, E(Y \mid X = x)]$ describes a curve in a two-dimensional space. Such a 
curve is called a regression curve; it is said to represent the regression of Y on X. 

The regression of Y on X is independent of the distribution of x: it is invariant from one 
group of x values to any other. 

If we suppose a linear relationship between Y and X, we can represent it by the equation Y 
= aX + b + e, where e is an error variable. If Y and e are random variables and E(e) = 0, then 
$E(Y \mid X) = aX + b$, i.e. the regression of Y on X is given by aX + b. 

When, for estimated values of a and b, the hypothesis of a linear relationship can be 
confirmed, the established relation will be the same whatever values of X are considered. In other 
words, the invariance property will hold for the a and b parameters. 
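
A small simulation illustrates this invariance in the linear case. The sketch below is not part of the IAEP analyses: the true values a = 2 and b = 1, the two distributions of X, the sample sizes, and the random seed are all hypothetical choices made for the illustration.

```python
import random


def ols(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def simulate(draw_x, n, rng, a=2.0, b=1.0):
    """Generate (x, y) pairs with y = a*x + b + e, where E(e) = 0."""
    xs = [draw_x(rng) for _ in range(n)]
    ys = [a * x + b + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(1991)
# Two groups with very different distributions of X (both hypothetical).
a1, b1 = ols(*simulate(lambda r: r.gauss(0.0, 1.0), 20000, rng))
a2, b2 = ols(*simulate(lambda r: r.uniform(2.0, 6.0), 20000, rng))
```

Both groups recover nearly the same slope and intercept even though their X values barely overlap, because the linear model holds in both. If the true relation were not linear, the two fitted lines would differ, which is why the fit of the model has to be checked before invariance can be claimed.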

The parallel with I.R.T. modeling can be illustrated with the case where scoring is 
dichotomous. 

Let the result (0 or 1) on a given item j be the variable $U_j$. Let $P_j(\theta)$ be the probability of a 
correct result, noted 1, given an ability $\theta$: $P_j(\theta) = P(U_j = 1 \mid \theta)$, and $Q_j(\theta)$ the probability of an 
incorrect result, noted 0, given an ability $\theta$: $Q_j(\theta) = P(U_j = 0 \mid \theta) = 1 - P(U_j = 1 \mid \theta)$. 

Let us suppose that the probability function of $U_j$ is of the Bernoulli type; then: 

$$f(u_j \mid \theta) = \begin{cases} P_j(\theta), & u_j = 1 \\ Q_j(\theta), & u_j = 0 \end{cases}$$

The regression of the observed result, for item j, on the ability $\theta$ is given by: 

$$E(U_j \mid \theta) = [P_j(\theta) \times 1] + [Q_j(\theta) \times 0] = P_j(\theta).$$

We call the regression of the observed result $U_j$, for item j, on the ability $\theta$ the 
item characteristic curve (I.C.C.) of item j. The characteristic curve is invariant for any distribution of 
the ability variable $\theta$. If $P_j(\theta)$ is in the form of a two-parameter logistic model, i.e. one parameter 
$\theta$ for the examinees and two parameters (a, b) for the items, then since the I.C.C. is invariant the 
parameters are, theoretically, also invariant. 

For the property to hold empirically, the modeling must meet some conditions. 
Certainly, an important one is that the model must be appropriate for the data. 
As with any regression model, the invariance property holds only if we can demonstrate 
that the model fits the data reasonably well. The main difficulty resides in defining what is meant 
by reasonably well. 

From the modelization point of view, goodness of fit is not the only concern that must get 
the practitioner's attention, even if it is an important point. At the initial calibration of items the 
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road that leads to parameters 1 estimation for our model, we must have some thought about the 
desired level of generalization we are aiming at for our results. In direct link with our measurement 
objectives we must have considerations for the way we sample the examinees. Not that we must 
use a specific probabilistic sampling plan -we are in the regression framework- but the sample must 
be conveniently heterogeneous (Hambleton and Swaminathan, 1985, p. 13). The task we have to 
tangle with is to give a practical meaning to the word conveniently. 
* 

In a new measurement situation, with items previously calibrated, invariance will hold only 
if the new examinees being tested have characteristics similar to those of the group that was used 
for the initial calibration of items (Lord & Novick, 1968, p. 360). We must then take into 
consideration the heterogeneity wanted in the initial calibration if we want to collect useful 
information that will help us explore the promise of invariance with future samples of examinees. 

When there is an interaction between a group of examinees and some given items, so that 
items have a different meaning for different groups (a sort of bias), the invariance property will not 
hold any more. 

Scaling 

First, let us review the entities we were dealing with in this scaling analysis. As described 
in a previous section, three elements contribute to a population's mean proficiency estimate: 
supposedly known item parameters, known answers to background items, and known answers to 
cognitive items. Thus, for each analysis, there were three data sets to be considered for each 
subject area and age group. 

The information that is known can be used directly. The content of the cognitive items is 
documented in the first IAEP results report published in February 1992. All the cognitive items 
included in the percents correct analyses in the first stage of data analysis were used for the IRT 
analyses. 

The background variables used for computing plausible values were not used directly. 
Since we are dealing almost exclusively with qualitative variables, and since plausible values 
methodology can be seen as a kind of regression analysis, all background variables were 
transformed into "dummy" variables. These variables took the form of a series of orthogonal 
comparisons which described the various categories of each background variable. These variables 

Figures 7 and 8 show fitted regression lines of the two proficiency scores. (Individual 
population proficiency scores were regressed against the reference population proficiency scores.) 
The plots at the top and bottom of Figure 7 are for mathematics, 9- and 13-year-olds, 
respectively. The plots at the top and bottom of Figure 8 are for science, 9- and 13-year-olds, 
respectively. 



Insert figures 7 and 8 about here 

These last figures indicate that the proficiency distributions estimated from the reference 
populations cannot be considered as having the same mean and standard deviation as the 
distributions estimated from the individual populations. The proficiency estimates are highly 
related, with correlations of over 0.98, but the midpoints and scales of the proficiency distributions 
are different. The midpoint and scale of a proficiency distribution are arbitrary. To compare 
proficiency distributions directly from population to population, they must be put onto the same 
metric. This is accomplished by equating. 

The scales that result from separate IRT scalings are not typically comparable, even if the 
same set of items is used in each of the scalings. The origin and scale units of the provisional 
scale for each individual population's scaling and for the reference population's scaling were 
established by setting the ability distribution of each of the respective calibration samples to a mean 
of zero and a standard deviation of one. Because all participating populations were used in the 
reference population's item calibrations, the origin and scale unit for the reference populations were 
based on the sum of the individual populations' ability distributions. In contrast, the origin and 
scale unit of each individual population's scale were based on the ability distribution of that single 
population. Clearly, without a transformation, the metrics for the individual populations are not 
comparable to one another or to the metrics of the reference populations. Consequently, additional 
procedures were employed to ensure all scores were reported on the same metric. 

The next step, then, was to put the item parameter estimates coming from the individual 
populations and those coming from the reference population for a given subject area and age group 
on the same metric. This was done by equating the item parameters estimated from the individual 
populations to those estimated from the reference populations (see Section for documentation of the 
equating procedures). After the equating was completed, proficiency scores were recalculated 
using the equated item parameter estimates and compared to proficiency scores computed from the 
item parameters estimated from the reference populations. 

Figures 9 and 10 illustrate what happened to the previous regression lines when equated item 
parameter estimates were used instead of those that were not equated. The lines are much more 
homogeneous and, except for one population, 13-year-old mathematics, we can say with 
confidence that the equated item parameter estimates and those coming from the reference 
populations were on the same metric. Moreover, the rank orders of the mean proficiency scores for 
populations based on the equated ICCs and those based on the reference population ICCs are the 
same (if standard errors are taken into account). These results indicate that no information has 
been lost as a result of using one set of item parameter estimates over another. We therefore 
decided to use the overall estimates based on the reference populations, and five plausible values 
were computed for each examinee from each population using Sheehan's M-Group program (the 
mainframe version). 



Insert figures 9 and 10 about here 

To assess the relationship between proficiency scores and the previously presented percent 
correct scores, correlations were computed between each plausible value and the mean percent 
correct score for each population. The results are presented in Tables 3 to 6. As we can see from 
these tables, the correlations between the mean percent correct and the mean of the plausible values 
(column labeled CORR) and each individual plausible value (columns labeled P1 to P5) were quite 
high. We can also observe that the rank order based on mean percent correct scores (column 
labeled %) corresponds to the rank order based on mean proficiency scores (column labeled PROF). 
The mean proficiency scores are presented on a scale with a mean of 500 and a standard deviation 
of 100. The next paragraphs describe the transformation applied to the original proficiency 
scale (approximately [-3.00, 3.00]). 



Insert tables 3 to 6 about here 



To perform a linear transformation of an existing scale, one has to know its initial mean 
and standard deviation and the targeted mean and standard deviation. For IAEP, the 
target mean and standard deviation were fixed at 500 and 100, respectively. These values were 
chosen mainly for reporting convenience. 

The initial mean and standard deviation had to be calculated from the existing data. In 
creating a common scale using all the participants, we decided that the initial mean and standard 
deviation would be calculated using the weights of each examinee in each population for a subject 
area. After equating the results of the 9- and 13-year-old populations for a given subject area 
(mathematics or science) 11, all the examinees in these two age groups were put together and the 
initial values were calculated. The following summarizes how this was done. 



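Let $\bar{X}_{\theta_1}$ and $S_{\theta_1}$ stand for the mean and standard deviation of the initial scale and $\bar{X}_{\theta_2}$ and 
$S_{\theta_2}$ stand for the mean and standard deviation of the targeted scale, respectively. Then $\theta_1$, a 
plausible value on the initial scale, is transformed to $\theta_2$, a plausible value on the transformed 
scale, by calculating: 

\[
\theta_2 = \frac{S_{\theta_2}}{S_{\theta_1}} \left( \theta_1 - \bar{X}_{\theta_1} \right) + \bar{X}_{\theta_2}
         = \frac{100 \, (\theta_1 - \bar{X}_{\theta_1})}{S_{\theta_1}} + 500
\]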
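
The rescaling step can be sketched in a few lines of code. This is a minimal illustration, not the program used for IAEP: the function names are invented, the data are made up, and the weighted standard deviation divides by the sum of the weights, which is an assumption since the text does not say which divisor was used.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (divisor: sum of weights, an assumption)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def rescale(values, weights, target_mean=500.0, target_sd=100.0):
    """Linearly transform plausible values so that the weighted mean and standard
    deviation of the pooled examinees equal the target values."""
    mean, sd = weighted_mean_sd(values, weights)
    return [target_sd * (v - mean) / sd + target_mean for v in values]


# Made-up plausible values on the initial scale, with made-up examinee weights.
plausible_values = [-1.0, 0.0, 2.0]
examinee_weights = [1.0, 2.0, 1.0]
scaled = rescale(plausible_values, examinee_weights)
print(scaled, weighted_mean_sd(scaled, examinee_weights))
```

Because the transformation is linear, the rank order of examinees and the shape of the distribution are unchanged; only the origin and unit of the scale move.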



Finally, the mean proficiency score was calculated for each population. Tables 8 and 9 present 
some of the results. 



Column one is a population ID. (The asterisk beside the standard error indicates a 
9-year-old population.) Column two gives the mean proficiency score for each population. Column 
three gives the standard error of the mean proficiency score. Remember that this standard error 
takes into account both sampling and imprecision in measurement. The next eleven columns give 
information on the distribution of scores. They provide the first, tenth, twentieth, thirtieth, 
fortieth, fiftieth, sixtieth, seventieth, eightieth, ninetieth, and hundredth percentiles of the 
distribution, respectively, for each population. 



Insert tables 8 and 9 about here 



Linking 9- and 13-year-old populations 

A small set of common items was included in the 9- and 13-year-old assessments in each 
subject area. (Fourteen of these common items in each subject were retained after the first stage of 
data analysis.) This design element made it possible to link the 9- and 13-year-old results 
onto a single scale within each subject area (i.e., mathematics and science). 



11 This part of the analysis is documented in the following section. 

The item parameters of the common items were estimated independently using the 
9-year-old reference population and the 13-year-old reference population. These item parameter 
estimates were different because the scales defined by each independent calibration of the items 
were different. In order to merge the two age groups' proficiency scores, we had to make sure they 
were expressed on the same scale. There are a number of methods for transforming item parameter 
estimates from one scale to another scale (see Stocking & Lord, 1983). 

Two procedures for transforming IRT results to a common scale are common items 
equating and equivalent populations equating. The common items equating procedure is used to 
equate two scales that contain a set of common items that were administered to independent 
samples drawn from different populations. This was the case for the 9- and 13-year-old IAEP 
populations. 

The procedure that was used to estimate the transformation for linking the two age groups 
was the Stocking-Lord procedure (Stocking & Lord, 1983) as implemented in the TBLT computer 
program (Stocking, 1986). 

The input data for the Stocking-Lord procedure consist of two sets of estimated item 
parameters, one set expressed on a target scale and one set expressed on a provisional scale. In the 
IAEP study the 13-year-old scale was chosen as the target scale and the 9-year-old scale as the 
provisional scale. The output of the Stocking-Lord procedure is a pair of parameter estimates, 
denoted here by A and B, defining the linear transformation that relates the IRT item parameter 
estimates expressed on the provisional scale to those expressed on the target scale. 
That is, 

\[
a_j^T = A^{-1} a_j^P, \qquad
b_j^T = A \, b_j^P + B, \qquad
c_j^T = c_j^P,
\]

where $(a_j^P, b_j^P, c_j^P)$ and $(a_j^T, b_j^T, c_j^T)$, for $j = 1, \ldots, n$, are the IRT parameter estimates obtained 
for the common items expressed on the provisional and target scales, respectively. Note that the 
lower asymptote parameters $c_j^P$ are unaffected by the transformation. 
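
A short sketch may help show why this particular transformation is used: if abilities are moved to the target scale with the same A and B, the response probabilities do not change. The code assumes the three-parameter logistic model with scaling constant 1.7; the values of A, B, and the item parameters are hypothetical, and the function names are invented for the illustration.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption of this sketch)


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def to_target(a_p, b_p, c_p, A, B):
    """Re-express provisional-scale item parameters on the target scale."""
    return a_p / A, A * b_p + B, c_p


# Hypothetical transformation constants and a hypothetical common item.
A, B = 1.1, -1.3
a_p, b_p, c_p = 1.2, 0.4, 0.2
a_t, b_t, c_t = to_target(a_p, b_p, c_p, A, B)

theta_p = 0.7              # an ability on the provisional scale
theta_t = A * theta_p + B  # the same ability expressed on the target scale

print(p3pl(theta_p, a_p, b_p, c_p), p3pl(theta_t, a_t, b_t, c_t))
```

The two printed probabilities agree up to rounding error, because the product of the discrimination and the distance between ability and difficulty is the same on both scales.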

The parameters of the linear transformation, A and B, are found by minimizing the squared 
difference between estimated true scores (expected numbers correct on the n common items) at N 
preselected proficiency values, $\theta = [\theta_1, \ldots, \theta_N]$. The function that is minimized is: 

\[
f(A, B) = \frac{1}{N} \sum_{i=1}^{N} \left[ \xi^T(1, 0, \theta_i) - \xi^P(A, B, \theta_i) \right]^2
\]

where $\xi^T(1, 0, \theta_i)$ is the estimated true score associated with the proficiency level $\theta_i$, calculated 
from the item parameters expressed on the target scale, and $\xi^P(A, B, \theta_i)$ is the estimated true score 
associated with the proficiency level $\theta_i$, calculated from the item parameters that were originally 
estimated on the provisional scale and then re-expressed on the target scale. That is, 

\[
\xi^P(A, B, \theta_i) = \sum_{j=1}^{n} \left\{ c_j^P + \frac{1 - c_j^P}
{1 + \exp\left[ -1.7 \, A^{-1} a_j^P \left( \theta_i - (A \, b_j^P + B) \right) \right]} \right\}
\]

where $a_j^P$ and $b_j^P$ are the estimated discrimination and difficulty parameters for item j, expressed on 
the provisional scale. The values $\theta = [\theta_1, \ldots, \theta_N]$ are typically selected to span the region of the 
target scale which is expected to be the most dense. 
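
The criterion can be written out directly. The sketch below assumes the three-parameter logistic model with scaling constant 1.7 and uses made-up item parameters. The minimizer is a crude compass search chosen only to keep the example self-contained; it is a stand-in and not the optimization routine of the TBLT program, whose internals are not described here. All function names are invented for the illustration.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption of this sketch)


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items, A=1.0, B=0.0):
    """Expected number correct at theta after re-expressing items with (A, B)."""
    return sum(p3pl(theta, a / A, A * b + B, c) for a, b, c in items)


def criterion(A, B, target_items, provisional_items, thetas):
    """Mean squared difference between the two estimated true scores."""
    return sum(
        (true_score(t, target_items) - true_score(t, provisional_items, A, B)) ** 2
        for t in thetas
    ) / len(thetas)


def fit(target_items, provisional_items, thetas, A=1.0, B=0.0, step=0.5, tol=1e-5):
    """Compass search over (A, B): try axis steps, halve the step when none helps."""
    best = criterion(A, B, target_items, provisional_items, thetas)
    while step > tol:
        improved = False
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_A, cand_B = A + dA, B + dB
            if cand_A <= 0.0:
                continue  # the slope of the transformation must stay positive
            value = criterion(cand_A, cand_B, target_items, provisional_items, thetas)
            if value < best:
                A, B, best, improved = cand_A, cand_B, value, True
        if not improved:
            step /= 2.0
    return A, B, best


# Made-up common items (a, b, c) on the provisional scale, and the same items
# expressed on a target scale through a known transformation (A = 1.1, B = -1.3).
provisional = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2),
               (1.1, 0.5, 0.2), (0.9, 1.0, 0.2)]
target = [(a / 1.1, 1.1 * b - 1.3, c) for a, b, c in provisional]
thetas = [-3.0 + 0.5 * k for k in range(13)]

A_hat, B_hat, loss = fit(target, provisional, thetas)
print(round(A_hat, 4), round(B_hat, 4), loss)
```

With error-free parameters, as in this toy case, the search should return values close to the constants used to build the target items. With real estimates the two sets of parameters differ by estimation error, so the minimum of the criterion is not zero.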

The transformations were obtained using Stocking's (1986) TBLT program. The equated 
item parameter estimates are presented in Tables 9 to 14. 



Insert tables 9 to 14 about here 



The transformation used for putting the 9-year-olds' item parameter estimates ($b_j^P$) onto the 
13-year-olds' scale was, for mathematics, 

TARGET = (1.076767 x PROVISIONAL) + (-1.323902), 

and for science, 

TARGET = (1.116443 x PROVISIONAL) + (-1.350289). 
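
As a worked illustration of the science transformation, take a hypothetical common item (the values are invented for the example and are not taken from the tables) with provisional difficulty 0.50 and provisional discrimination 1.00. Applying the relations given earlier for the difficulty and discrimination parameters, with A = 1.116443 and B = -1.350289:

```latex
\begin{align*}
b^{T} &= (1.116443 \times 0.50) + (-1.350289) = 0.5582215 - 1.350289 = -0.7920675 \\
a^{T} &= \frac{1.00}{1.116443} \approx 0.8957
\end{align*}
```

The item appears easier on the 13-year-old scale (lower difficulty value) and slightly less discriminating, while its lower asymptote would be left unchanged.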

Of course, obtaining such equations does not mean that estimates are identical. The quality 
of the linking procedure must still be checked. 

For mathematics, we can see by looking at Tables 9 to 11 and Figures 11 to 13 (where 
9-year-old estimates are on the horizontal axis and 13-year-old estimates are on the vertical axis) 
that the main problem lies with the difficulty parameter of one item, item 30. In Figure 12, this item 
is the point at the extreme right (0.43, -0.62). Otherwise, even if values do not fall exactly on a 
straight line, we can consider that parameters at each age are similar. (Remember that the c values 
were not equated.) Correlations between estimates of the a and b parameters were 0.67 and 0.80, 
respectively. 

Insert figures 11 to 13 about here 

For science, we can see by looking at Tables 12 to 14 and Figures 14 to 16 that there are two 
items which present a problem, item 26 and item 27. These are the same items that are singled out 
in the next chapter on the item anchoring procedure. Correlations between estimates of the a and b 
parameters were 0.67 and 0.73, respectively. 



Insert figures 14 to 16 about here 



What should be done with these items? Ideally they should be removed and the equating 
redone. However, the number of common items in the IAEP assessment was limited, and the 
performance of the linking procedure is at least partly affected by the number of items included in 
the linking. This is because the procedure is sensitive to uncertainty due to model misfit, which 
becomes more severe as the number of linking items decreases (Sheehan, 1988). In a sense, as the 
number of items increases, the effect of a few peculiar items should decrease. Because we felt that 
mean proficiency score estimates would be robust to the presence of a small number of non-"ideal" 
items in the equating procedure, and because we wanted to keep as many items as possible (content 
coverage being important), the decision made was to keep all the items and go ahead with the 
linking of the 9- and 13-year-old scales, keeping in mind that there is always some uncertainty 
due to any linking procedure (as well illustrated by Sheehan, 1988). 



Conclusion 

Results show that even if, on an item-by-item basis, the equating gives results that are not 
"ideal", when all the items are taken into account (i.e. the test format) the students' proficiency 
scores estimated with both sets of item parameter estimates (from the reference populations and 
from the "equated" individual populations) can be considered to be on the same scale. Moreover, 
the rank orders of the mean proficiency scores for populations based on the equated ICCs and 
those based on the reference population ICCs are the same (if standard errors are taken into 
account). These results indicate that no information has been lost as a result of using one set of item 
parameter estimates over another. 



Implications for applied research and educational data analysis could be important. When 
working with multiple heterogeneous populations, one can choose to work with a reference 
population that is a mixture of the individual populations. In doing so, there should be no loss of 
information if one is working with aggregated measures (test scores or populations' mean 
test scores). However, one has to be careful when working with individual items (as in 
computerized adaptive testing), since there could be some important discrepancies between the 
equated item parameter estimates of individual populations. 

Table 1  Mathematics, age 9, items' parameter estimates and standard errors 

ITEM      LABEL        SLOPE        SE           THRESHOLD    SE           GUESSING     SE
11-1M3    GEO-PS       0.84981      0.02653      [illegible]  [illegible]  0            0
12-1M3    ALG-PS       1.00732      0.08466      [illegible]  [illegible]  [illegible]  0.02206
13-1M3    NUM-CU       1.22822      [illegible]  [illegible]  [illegible]  0.26364      0.01799
14-1M3    NUM-PK       0.892        [illegible]  [illegible]  [illegible]  0.22669      0.0309
15-1M3    NUM-CU       0.98494      [illegible]  [illegible]  [illegible]  0.16772      0.01918
16-1M3    NUM-PK       1.5501       [illegible]  [illegible]  [illegible]  0.2001       0.01371
1-1M4     GEO-CU       0.93718      [illegible]  [illegible]  0.10792      0.32588      0.04636
2-1M4     NUM-PK       [illegible]  [illegible]  [illegible]  [illegible]  0.21197      0.03083
3-1M4     NUM-PK       1.22652      [illegible]  [illegible]  [illegible]  0.19931      0.02537
4-1M4     DAT-PK       0.9241       [illegible]  [illegible]  [illegible]  0.35635      0.04004
5-1M4     DAT-PS       0.98672      [illegible]  [illegible]  [illegible]  0.16773      0.0187
6-1M4     DAT-PK       1.16635      [illegible]  [illegible]  [illegible]  0.23954      0.02506
7-1M4     MEA-PK       0.62855      0.04601      [illegible]  [illegible]  [illegible]  0.03942
8-1M4     NUM-PK       1.11387      0.05929      -0.07947     0.04527      0.22027      0.02104
9-1M4     ALG-CU       0.34832      0.01831      -1.73093     0.09605      0            0
10-1M4    NUM-PK       0.60655      0.0216       -0.76071     0.03448      0            0
11-1M4    NUM-PS       0.90445      0.02645      -0.1567      0.0194       0            0
12-1M4    NUM-CU       1.04991      0.03073      0.50936      0.01824      0            0
13-1M4    MEA-PK       0.86948      0.02556      0.45126      0.02103      0            0
14-1M4    MEA-CU       0.91701      0.02727      0.24468      0.01881      0            0
15-1M4    NUM-PS       1.34951      0.07053      0.16408      0.03212      0.20245      0.01628
16-1M4    GEO-CU       0.95731      0.06168      0.52416      0.04504      0.20675      0.01847

Table 2  Mathematics, age 13, items' parameter estimates and standard errors 

ITEM      LABEL        SLOPE        SE           THRESHOLD    SE           GUESSING     SE
1-2M1     GEO-CU       0.53152      0.02701      -1.03639     0.14854      0.21089      0.04779
2-2M1     ALG-PK       1.52416      0.05035      -0.31848     0.02463      0.16824      0.01364
3-2M1     NUM-CU       0.77062      0.03108      -0.78654     0.07639      0.17051      0.03288
4-2M1     MEA-CU       0.66237      0.03307      -0.16034     0.07958      0.19393      0.02778
5-2M1     NUM-CU       0.59752      0.03487      -0.36151     0.11255      0.22227      0.03656
6-2M1     NUM-PS       0.70937      0.03162      -0.96756     0.09884      0.18841      0.04108
7-2M1     NUM-PS       1.17333      0.05522      0.69702      0.02545      0.22404      0.0102
8-2M1     NUM-CU       1.39448      0.08895      1.17337      0.02702      0.31621      0.00799
9-2M1     MEA-PS       1.2339       0.04137      0.43553      0.01877      0.07072      0.00841
10-2M1    DAT-PK       0.54627      0.01585      -1.15737     0.03524      0            0
11-2M1    [illegible]  0.96275      0.02022      -0.14431     0.01472      0            0
12-2M1    DAT-PK       0.8056       0.01767      -0.30767     0.01771      0            0
13-2M1    [illegible]  1.06263      0.02194      -0.09168     0.01364      0            0
14-2M1    ALG-PS       0.91098      0.02172      0.84579      0.01836      0            0
15-2M1    [illegible]  1.60427      0.05707      -0.33532     0.02445      0.15037      0.01459
16-2M1    DAT-PK       1.26213      0.04681      0.06054      0.02739      0.16437      0.01374
17-2M1    NUM-PK       1.18523      0.05278      0.09199      0.03303      0.19922      0.01605
18-2M1    MEA-PS       1.47208      0.06709      0.50954      0.02251      0.21778      0.01072
19-2M1    MEA-PK       1.80225      0.09874      1.13716      0.01923      0.19567      0.00683
1-12M2B   NUM-PS       0.73187      0.02993      -1.44386     0.10678      0.20564      0.0476
2-12M2B   NUM-PS       0.83575      0.03072      -1.66592     0.08896      0.18289      0.04655
3-12M2B   NUM-PS       0.66709      0.0432       -0.14352     0.10024      0.3104       0.03102
4-12M2B   ALG-PS       0.59672      0.03163      -0.41038     0.10368      0.18696      0.03493
5-12M2B   NUM-PK       0.56451      0.03162      -0.56463     0.12584      0.20861      0.04049
6-12M2B   DAT-PS       0.67999      0.02609      -1.45374     0.10456      0.18137      0.04494
7-12M2B   NUM-CU       1.03708      0.05077      -0.58454     0.061        0.30164      0.02651
8-12M2B   NUM-PS       0.99587      0.03724      -0.38028     0.04429      0.17997      0.02025
9-12M2B   DAT-CU       0.82189      0.03403      -0.33349     0.05725      0.17542      0.02384
10-12M2B  NUM-PK       0.99254      0.02045      -0.74667     0.01791      0            0
11-12M2B  NUM-CU       1.11276      0.05668      -0.62607     0.06221      0.37623      0.02581
12-12M2B  NUM-CU       1.24063      0.04485      0.23263      0.02618      0.2069       0.01175
13-12M2B  ALG-PS       0.87039      0.04242      -0.60479     0.07411      0.24596      0.03094
14-12M2B  GEO-CU       1.07588      0.0475       -0.53731     0.053        0.24811      0.02488
15-12M2B  MEA-PS       1.2911       0.04336      0.35917      0.02007      0.09908      0.00954
16-12M2B  ALG-CU       1.30808      0.04722      0.16811      0.02501      0.1734       0.01218
17-12M2B  GEO-PS       1.04501      0.04233      0.06249      0.03601      0.19126      0.01607
18-12M2B  ALG-PK       1.28474      0.05651      0.5887       0.02388      0.21077      0.01048
19-12M2B  NUM-CU       1.23464      0.0533       0.90127      0.02077      0.12741      0.00798
1-2M3     NUM-CU       0.9243       0.0488       -0.74871     0.08205      0.32415      0.03292
2-2M3     NUM-PK       0.74802      0.03835      -1.00432     0.1131       0.28807      0.04331
3-2M3     [illegible]  0.4541       0.02393      -2.40838     0.20256      0.23125      0.06048
4-2M3     MEA-CU       1.07484      0.03087      -0.00394     0.02274      0.05363      0.01045
5-2M3     [illegible]  0.64575      0.04069      -0.74695     0.13943      0.31931      0.0439
6-2M3     [illegible]  1.42528      0.0478       0.31203      0.01867      0.12614      0.00895
7-2M3     [illegible]  1.03709      0.0367       -0.03512     0.031        0.12964      0.01418
8-2M3     NUM-CU       1.00965      0.05687      0.75797      0.03161      0.24858      0.012
9-2M3     MEA-PS       1.52737      0.07601      0.27018      0.02708      0.35802      0.01154
10-2M3    ALG-PK       1.29564      0.0258       -0.32567     0.0123       0            0
11-2M3    MEA-PK       0.78811      0.01754      -0.11313     0.01647      0            0
13-2M3    MEA-PK       1.11645      0.02295      0.19065      0.01276      0            0
14-2M3    GEO-PK       1.11033      0.05179      -0.03915     0.04108      0.30152      0.01698
15-2M3    NUM-CU       1.3071       0.05039      0.16072      0.02434      0.17517      0.01187
16-2M3    ALG-PS       1.71041      0.05515      0.43638      0.01421      0.09764      0.00668
17-2M3    [illegible]  0.85475      0.03878      0.10066      0.04749      0.18707      0.01901
18-2M3    MEA-PS       1.55746      0.07126      0.86763      0.01874      0.16945      0.00758
19-2M3    [illegible]  [illegible]  0.10385      [illegible]  0.02147      0.2686       0.00802
1-2M4     DAT-CU       0.5586       0.0264       -1.45686     0.15091      0.21829      0.05256
2-2M4     NUM-PK       1.36225      0.04727      -0.81677     0.03814      0.19853      0.02174
3-2M4     NUM-CU       1.0719       0.0561       0.38773      0.03704      0.31409      0.01396
4-2M4     NUM-PK       1.07789      0.04287      0.04027      0.03283      0.16783      0.0151
5-2M4     GEO-PK       0.85501      0.03016      -1.16239     0.06939      0.14685      0.03458
6-2M4     NUM-CU       1.3016       0.04317      0.59519      0.01652      0.06197      0.00676
7-2M4     ALG-CU       1.62637      0.0731       0.7426       0.01913      0.25316      0.0078
8-2M4     DAT-PK       1.26523      0.04181      -0.50717     0.03285      0.16329      0.01716
9-2M4     MEA-CU       1.25752      0.06774      0.55854      0.031        0.37872      0.01117
10-2M4    ALG-PK       0.84411      [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
11-2M4    GEO-CU       1.246        [illegible]  -0.20851     [illegible]  0            [illegible]
12-2M4    NUM-PK       0.53821      0.01469      0.19705      0.02275      0            [illegible]
13-2M4    GEO-PS       1.12126      0.02335      -0.33497     0.01398      0            0
14-2M4    ALG-PK       1.18849      0.02528      0.09036      0.01241      0            0
15-2M4    NUM-PK       1.13158      0.02457      0.55491      0.01392      0            0
16-2M4    GEO-PK       1.07095      0.02358      0.30392      0.01369      0            0
17-2M4    NUM-PS       1.07142      0.05001      0.43645      0.03227      0.21714      0.01353
18-2M4    MEA-PS       1.81036      0.09149      0.84193      0.0194       0.262        0.00814
19-2M4    ALG-PS       1.24096      0.05742      0.63267      0.02462      0.1774       0.01057

Table 3  Mathematics, age 9, correlations between percent correct scores and plausible values 

POP   %       PROF    CORR    P1      P2      P3      P4      P5
1     61.6    409.8   0.97    0.94    0.94    0.94    0.94    0.94
2     60.2    405.2   0.98    0.94    0.94    0.94    0.94    0.94
3     58.6    400.4   0.98    0.94    0.95    0.94    0.95    0.95
4     68.3    434.6   0.97    0.94    0.94    0.94    0.94    0.94
5     58.9    399     0.98    0.95    0.95    0.95    0.95    0.95
6     64.3    421.1   0.98    0.94    0.95    0.95    0.95    0.94
7     67.6    433.3   0.97    0.93    0.94    0.94    0.94    0.94
8     74.8    462.6   0.97    0.93    0.93    0.94    0.94    0.94
9     59.5    401     0.97    0.94    0.94    0.94    0.95    0.94
10    56.5    390.4   0.97    0.94    0.94    0.94    0.94    0.94
11    54.3    382.9   0.97    0.93    0.93    0.93    0.93    0.93
12    56.4    388.7   0.98    0.94    0.95    0.95    0.95    0.94
13    62.3    411.8   0.98    0.95    0.95    0.95    0.95    0.95
14    64.5    421.8   0.98    0.94    0.94    0.94    0.94    0.94
15    65.1    426.2   0.98    0.95    0.95    0.95    0.95    0.95
16    56.2    381.5   0.97    0.92    0.93    0.93    0.92    0.93
17    65.6    427.2   0.97    0.94    0.94    0.94    0.94    0.94
18    61.1    407     0.97    0.94    0.95    0.94    0.94    0.95
19    68.2    437.4   0.97    0.94    0.95    0.95    0.94    0.94
20    57.3    391     0.98    0.95    0.95    0.95    0.95    0.95

Table 4  Mathematics, age 13, correlations between percent correct scores and plausible values 

POP   %       PROF    CORR    P1      P2      P3      P4      P5
1     63.8    522.1   0.98    0.95    0.95    0.95    0.95    0.95
2     65.9    531.1   0.97    0.95    0.95    0.95    0.95    0.95
3     62.1    517.7   0.97    0.95    0.95    0.95    0.95    0.95
4     80.1    582.3   0.95    0.91    0.91    0.9     0.91    0.92
5     60.6    515     0.97    0.95    0.95    0.96    0.95    0.95
6     33.9    407.4   0.9     0.84    0.85    0.85    0.85    0.85
7     64.4    525.7   0.98    0.95    0.95    0.95    0.96    0.96
8     68.3    539     0.97    0.95    0.95    0.95    0.95    0.95
9     60.5    512.4   0.97    0.95    0.95    0.95    0.95    0.94
10    63.5    523.1   0.97    0.95    0.95    0.95    0.95    0.95
11    64.7    523.4   0.97    0.95    0.95    0.95    0.95    0.95
12    40.7    442.3   0.94    0.9     0.9     0.91    0.9     0.91
13    73.6    557.3   0.96    0.94    0.94    0.94    0.94    0.94
14    57.7    503.1   0.97    0.94    0.94    0.94    0.94    0.94
15    63.3    521.3   0.97    0.94    0.94    0.95    0.95    0.94
16    30.8    401.3   0.84    0.74    0.74    0.73    0.73    0.74
17    57.5    501.6   0.97    0.94    0.95    0.94    0.94    0.95
18    60.9    512.6   0.97    0.94    0.95    0.95    0.95    0.95
19    59      508.7   0.97    0.94    0.94    0.94    0.94    0.94
20    59.9    511.9   0.97    0.95    0.95    0.95    0.95    0.95
21    57.8    505.1   0.97    0.95    0.94    0.?4    0.94    0.95
23    53.9    491.7   0.97    0.93    0.94    0.93    0.?3    [illegible]
24    50.2    478.1   0.96    0.93    0.93    0.93    0.93    [illegible]
25    65.1    531.4   0.97    0.95    0.95    0.95    0.95    [illegible]
26    68.9    537.8   0.98    0.94    0.95    0.95    0.95    0.95
27    36.7    423.2   0.93    0.88    0.88    0.88    0.88    0.88
28    61      517.5   0.98    0.95    0.95    0.94    0.95    0.95
29    61.1    534.4   0.97    0.94    0.94    0.93    0.95    0.95
30    60.7    515.8   0.98    0.95    0.96    0.95    0.96    0.95
31    57.6    504.9   0.98    0.95    0.95    0.95    0.95    0.95
32    70.3    544.7   0.97    0.95    0.95    0.95    0.95    0.95
33    55.7    492.6   0.97    0.93    0.93    0.93    0.93    0.93
34    74.2    555     0.97    0.94    0.94    0.94    0.93    0.94
35    72.7    561.8   0.95    0.93    0.93    0.93    0.94    0.93
36    54.6    491.4   0.97    0.95    0.95    0.94    0.95    0.94
Table 5 Science, age 9, correlations between percent correct scores and plausible values

POP  %     PROF   CORR  P1    P2    P3    P4    P5
1    65.4  436.4  0.96  0.91  0.91  0.91  0.9   0.91
2    62.2  417.3  0.96  0.9   0.9   0.91  0.9   0.91
3    62.5  418.8  0.97  0.91  0.92  0.92  0.91  0.92
4    62    418.8  0.95  0.89  0.9   0.89  0.89  0.89
5    54.8  379.5  0.96  0.91  0.91  0.91  0.91  0.91
6    60.9  411.5  0.96  0.91  0.91  0.91  0.91  0.92
7    66.4  440.9  0.96  0.91  0.91  0.91  0.91  0.91
8    68.4  441.9  0.96  0.9   0.9   0.9   0.91  0.91
9    60.8  409.3  0.96  0.9   0.91  0.91  0.91  0.91
10   61.6  414.2  0.96  0.91  0.91  0.91  0.91  0.91
11   55.6  381.4  0.95  0.88  0.88  0.88  0.88  0.88
12   54    373.5  0.95  0.89  0.88  0.88  0.89  0.89
13   62.3  419    0.96  0.91  0.92  0.91  0.91  0.91
14   62.4  417.3  0.96  0.89  0.89  0.9   0.89  0.9
15   61    413.2  0.96  0.9   0.91  0.91  0.91  0.9
16   57.9  383.1  0.94  0.86  0.86  0.87  0.86  0.86
17   61.2  414.3  0.96  0.9   0.9   0.9   0.91  0.9
18   61    410.4  0.96  0.91  0.91  0.91  0.91  0.91
19   66.2  437.4  0.97  0.92  0.92  0.92  0.92  0.92
20   63.6  426.4  0.97  0.93  0.93  0.93  0.93  0.93

Table 6 Mathematics, age 9, correlations between percent correct scores and plausible values

POP  NB    %     PROF   CORR  P1    P2    P3    P4    P5
1    1460  73.9  539    0.96  0.92  0.92  0.92  0.92  0.92
2    1617  72.5  533.2  0.96  0.91  0.92  0.91  0.92  0.91
3    4980  68.9  517.3  0.96  0.92  0.92  0.92  0.92  0.92
4    1775  66.7  510    0.97  0.92  0.93  0.92  0.93  0.92
5    929   68    516.1  0.97  0.92  0.93  0.94  0.94  0.93
6    1505  48.3  407    0.93  0.86  0.86  0.86  0.87  0.87
7    1787  68.4  516    0.97  0.94  0.93  0.93  0.94  0.93
8    1623  73.3  538    0.96  0.93  0.93  0.93  0.93  0.93
9    1657  63.1  492.6  0.97  0.93  0.93  0.93  0.93  0.93
10   1584  69.4  518.2  0.97  0.93  0.93  0.93  0.93  0.93
11   1485  70.6  521    0.97  0.92  0.93  0.93  0.93  0.93
12   1588  57.5  454.9  0.96  0.92  0.92  0.91  0.92  0.91
13   1635  77.5  556    0.97  0.92  0.92  0.92  0.93  0.93
14   1672  68.7  514.6  0.97  0.93  0.93  0.93  0.93  0.93
15   666   66.5  506.2  0.97  0.91  0.92  0.92  0.93  0.92
16   1604  66.3  504.6  0.97  0.92  0.93  0.92  0.92  0.93
17   1656  63.2  493    0.96  0.92  0.92  0.92  0.92  0.92
18   1566  66.7  504.7  0.97  0.93  0.93  0.93  0.93  0.93
19   1542  69.1  516.3  0.97  0.93  0.93  0.93  0.93  0.93
20   1609  66.9  509.2  0.96  0.92  0.92  0.92  0.92  0.92
21   1434  60.4  479.7  0.96  0.91  0.91  0.91  0.92  0.91
22   1520  63.7  490    0.97  0.93  0.93  0.92  0.92  0.92
23   1416  69.5  519.6  0.97  0.92  0.93  0.92  0.92  0.92
24   1579  71.3  528.5  0.96  0.92  0.92  0.92  0.92  0.92
25   1469  52.5  434.5  0.95  0.89  0.9   0.9   0.89  0.89
26   1694  70.6  521.7  0.97  0.92  0.93  0.92  0.93  0.93
27   223   65    500.7  0.96  0.92  0.9   0.9   0.91  0.9
28   1584  67.4  513.6  0.97  0.93  0.94  0.93  0.93  0.93
29   1598  70.4  521.4  0.97  0.93  0.92  0.92  0.93  0.93
30   1839  70.9  525.1  0.97  0.92  0.92  0.92  0.92  0.92
31   1609  68.2  508.1  0.96  0.92  0.92  0.91  0.91  0.92
32   3653  75.6  549.5  0.96  0.91  0.91  0.91  0.91  0.91
33   1786  75.8  548.6  0.97  0.94  0.93  0.93  0.94  0.94
34   1404  67    504.1  0.97  0.93  0.93  0.93  0.93  0.93

Table 7 Mathematics, ages 9 and 13, mean proficiency scores and percentiles by population

MEAN    SE     C1      C10     C20     C30     C40     MED     C60     C70     C80     C90     C100
381.51  1.93*  191.95  302.95  329.34  351.09  368.26  386.74  399.66  413.62  431.31  453.7   519.27
383.06  2.15*  175.97  298.51  329.01  353.24  372.77  388.7   402.94  417.25  435.31  457.95  522.98
388.03  3.92*  170.03  291.46  323.93  350.9   372.47  393.65  411.75  429.11  450.92  477.2   553.19
390.39  2.66*  164.18  296.37  332.78  358.03  378.49  397.15  412.85  429.43  448.72  474.92  547.83
391.26  4.38*  164.71  271.8   321.93  351.36  377.71  399.38  416.93  435.41  459.57  486.67  578.95
399.46  3.13*  171.11  289.46  337.99  366.86  387.93  408.49  423.98  442.74  463.34  489.78  585.16
400.42  7.8*   183.19  298.95  337.59  362.18  382.33  402.48  421.16  439.54  464.51  500.76  635.98
400.99  2.02*  152.84  299.55  346.36  371.24  391.16  409.16  424.45  438.92  457.73  484.39  599.54
401.04  1.63   258.91  342.78  363.85  378.64  392.02  403.56  412.59  426.41  438.26  455.65  515.54
407.34  3.9*   162.96  300.15  343.89  372.9   395.58  413.56  431.08  450.05  469.37  497.44  595.34
407.46  2.93   220.32  314.19  344.3   368.36  385.39  403.4   425.48  449.96  471.44  502.89  589.64
409.65  2.66*  149.78  315.67  356.93  382.11  398.72  414.26  430.81  447.26  464.84  494.61  594.11
411.81  3.19*  173.85  315.81  355.87  382     401.35  417.68  432.41  449.51  469.16  497.48  605.63
421.16  2.81*  187.59  326.3   362.07  385.5   407.12  424.45  443.01  459.63  481.73  505.94  634.?8
421.74  2.77*  192.68  336.15  373.65  395.33  413.02  427.06  441.69  454.44  471.01  494.62  570.8
424.11  3.65   224.77  333.53  364.4   383.24  403.91  419.4   439.74  458.89  483     518.93  594.88
426.12  3.34*  183.38  339.3   371.61  393.54  412.53  428.24  442.83  462.76  483.35  515.21  618.77
427.54  4.43*  198.88  328.67  364.13  392.39  411.34  431.63  449.19  467.48  487.96  518.17  633.28
433.29  3.59*  204.37  345.26  380.6   401.65  418.79  434.99  449.1   466.1   486.56  523.79  628.13
434.38  2.63*  188.21  334.82  370.79  396.93  419.31  438.86  455.42  475.13  501.91  526.3   648.28
437.23  3.03*  191.05  342.53  378.26  403.58  424.66  441.8   459.01  473.95  496.25  527.41  616.47
442.39  3.51   250.4   349.94  382.51  408.77  426.25  443.87  462.35  479.27  501.23  527.64  627.97
462.65  2.42*  193.39  373.09  407.46  433.4   452.04  466.28  481.15  499.76  518.23  546.32  684.15
477.27  2.85   270.53  400.39  432.52  451.07  468.31  481.81  496.42  510.26  526.58  548.15  670.91
491.62  1.97   284.05  420.75  452.84  470.07  483.75  495.44  506.22  518.88  534.03  549.89  624.32
491.85  3.89   277.32  404.8   435.05  457.15  475.91  492.7   508.54  526.45  545.82  572.57  685.99
492.78  2.5    292.07  426.38  450.49  466.75  480.74  494.36  508.38  521.49  535.65  555.64  636.41
501.61  1.58   294.14  428.66  458.93  477.05  490.82  504.02  516.2   530.37  547.28  570.69  660.67
503.17  2.53   269.1   428.57  458.99  478.35  492.39  507.16  521.15  534.26  550.17  572.58  649.6
504.75  2.33   293.34  425.05  454.79  474.83  493.06  508.66  522.82  536.37  551.51  575.69  650.8
505.41  2.69   305.02  436.54  459.63  475.85  491.7   505.45  518.06  533.75  550.18  575.07  665.12
508.60  2.02   257.4   436.49  465.37  485.11  500.75  512.49  524.86  537.97  553.37  573.86  652.7
511.72  1.67   283.82  441.21  467.56  484.24  500.83  513.54  525.32  539.79  556.74  580.32  678.45
512.36  2.79   237.55  430.4   460.45  483.36  501.75  518.54  531.16  547.12  564.78  586.85  694.78
512.60  1.24   283.65  435.42  468.04  488.96  503.51  518.04  531.36  544.17  559.84  579.22  647.03
514.67  7.03   281.3   428.78  466.16  484.02  499.37  515.81  531.41  548.28  570.28  599.71  708.92
515.51  2.79   305.98  437.54  466.21  487.24  503.38  517.69  532.91  549.25  565.47  587.34  672.02
517.44  2.3    260.66  446.55  475.24  494.08  507.92  520.07  532.79  545.74  561.52  583.45  672.68
521.27  2.03   296.6   461.58  485.39  499.89  511.7   522.03  533.6   544.67  559.6   581.12  640.71
522.03  2.31   288.29  450.34  476.13  495.41  509.69  523.07  536.03  550.5   567.34  590.99  685.49
522.99  2.48   303.41  447.82  477.19  499.03  513.81  527.55  541.21  555.61  569.85  591.07  688.79
523.11  2.67   288.95  443.71  471.78  495.36  513.51  529.59  543.2   556.84  571.71  592.7   671.52
525.66  2.5    288.39  445.59  475.44  496.44  513.3   529.01  544.99  558.96  576.56  598.29  675.2
530.96  3.34   277.82  459.16  486.32  504.37  518.64  530.33  543.5   557.22  575.16  602.92  685.15
531.15  2.17   311.03  461.64  487.58  503.2   516.43  529.79  542.08  557.72  574.02  601.26  680.61
534.41  3.32   369.15  471.85  496.64  510.36  521.73  531.31  545.97  557.95  568.25  582.47  623.23
537.72  2.28   332.12  474.66  499.51  515.5   527.35  538.74  550.82  562.16  577.75  598.14  682.6
538.88  2.68   296.35  451.82  485.63  507.75  527.54  542.65  557.38  572.17  593.72  619.7   725.04
544.63  2.94   299.35  467.72  496.51  518.11  536.2   548.69  562.98  577.16  592.08  613.33  705.32
552.78  3.73   339.55  487.93  514.17  531     545.01  556.64  569.29  582.76  596.66  616.93  712.96
557.19  2.58   265.81  458.05  497.84  523.48  542.78  561.19  579.1   596.79  617.95  646     746.78
561.82  2.79   229.5   437.83  481.47  510.77  540.16  567.57  594.4   617.13  641.98  678.49  815.19
582.47  4.18   367.86  510.76  536.91  553.67  566.83  580.7   595.57  610.22  627.42  653.95  747.57

Table 8 Science, ages 9 and 13, mean proficiency scores and percentiles by population



MEAN SE 



374.41 

382.05 

383.63 

385.25 

412.25 

415.57 

417.15 

418.08 

420.43 

421.35 

421.89 

424.57 

425.91 

426.43 

426.62 

435.63 

445.52 

446.32 

447.77

451.82 

453.04 

467.93 

496.29 

507.53 

511.31 

511.54 

520.45 

524.8 

524.87 

524.91 

526.74 

529.33 

530.36 

531.12 

535.17 

536.55 

537.99 

538.13 

538.22 

540.66 

542.13 

543.85 

544.16 

544.59 

548.17 

552.16 

557.58 

563.13 

564.36 

573.87 

575.49 

583.82 



4.46* 

4.07* 

2.63* 

2.56* 

4.34 

2.32* 

4.25* 

3.74* 

3.62* 

3.25* 

6.05* 

3* 

5.69*

2.87*

3.97* 

5.45* 

3.4 

3.5* 

3.21* 

4.96* 

2.79* 

3.89 

2.64 

4.04 

3.01 

1.73 

4 

5.21 

2.31 

1.77 

3.37 

2.77 

3.32 

5.6 

3.36 

3.15 

2.99 

2.27 

6 

3.37 

2.58 

2.95 

2.57 

2.65 

4.2 

2.57 

2.26 

2.94 

2.14 

4.26 

2.23 

2.72 



C1

140.22 

119.03 

130.02 

169.31 

169.13 

134.5 

156.16 

132.6 

160.61 

154.75 

161.45 

149.72 

152 

156.35 

163.91 

141.93 

203.01 

179.23 

160.14 

175.33 

207.95 

164.65 

274.93 

227.89 

227.36 

232.71 

292.96 

182.23 

268.86 

240.86 

296.63 

247.75 

275.43 

248.69 

252.46 

260.02 

289.44 

300.91 

226.85 

274.45 

282.22 

276.07 

254.09 

264.51 

274.19 

301.09 

288.7 

297.35 

242.1 

309.41 

251.2 

266.76 



C10 C20



269.09 

249.51 

285.61 

291.91 

301.78 

283.85 

302.1 

305.04 

308.9 

302.13 

328.85 

331.31 

295.71 

333.34 

308.2 

295.25 

326.05 

343.37 

331.5 

346.21 

360.33 

351.34 

407.55 

402.27 

402.75 

415.63 

451.86 

423.9 

429.56 

430.43 

439.23 

444.3 

443.49 

426.94 

430.19 

434.92 

430.79 

443.63 

430.95 

439.41 

455.54 

451.33 

453.16 

457.77 

457.57 

470.3 

473.88 

461.29 

479.9 

489.19 

455.41 

487.23 



309.39 

301.91 

322.64 

327.2 

337.98 

348.39 

344.6 

343.88 

356.01 

351.81 

360.49 

362.64 

348.07 

369.66 

354.54 

354.71 

367.69 

384.75 

375.91 

383.84 

395.49 

394.72 

438.43 

444.48 

446.06 

453.62 

470.2 

466.56 

466.29 

466.4 

470.2 

473.12 

478.33 

466.17 

472.87 

472.69 

474.59 

475.63 

472.43 

475.47 

486.13 

484.28 

483.05 

489.06 

490.27 

498.57 

507.4 

496.45 

509.88 

522.52 

500.73 

524.28 



C30 

332.55 

338.6 

348.96 

353.9 

365.65 

378.19 

373.33 

375.01 

379.73 

381.26 

382.74 

389.48 

384.03 

394.?5 

386.53 

393.63 

396.46 

414.54 

405.79 

405.5 

416.28 

427.97 

461.01 

469.11 

472.41 

475.82 

487.14 

489.91 

489.56 

491.41 

492.22 

495.02 

499.91 

493.33 

497.86 

504.21 

500.24 

501.74 

495.8 

502.35 

508.33 

508.7 

506.37 

510.36 

514.61 

517.45 

528.55 

520.99 

531.42 

543.83 

537.77 

550.35 



C40 

354.34 

364.42 

366.65 

370.6 

387.62 

407.24 

398.52 

396.09 

405.72 

407.34 

400.44 

409.28 

410.09 

412.85 

409 

421.04 

422.52 

433.93 

427.26 

429.99 

434.59 

454.39 

477.17 

490.64 

494.09 

494.96 

500.62 

509.14 

508.64 

511.46 

510.01 

513.01 

516 

514.52 

520.6 

522.72 

520.98 

522.25 

519.61 

523.83 

525.46 

527.63 

525.41 

529.63 

535.66 

535.25 

543.89 

540.99 

549.61 

560.47 

563.82 

571.04 



MED 

374.84 

389.76 

381.73 

387.54 

410.12 

428.44 

423.03 

416.61 

423.13 

431.77 

419.75 

429.94 

433.88 

429.47 

431.82 

444.18 

444.8 

454.16 

449.58 

452.2 

451.87 

475.34 

494.93 

510.4 

512.67 

515.78 

514.82 

527.06 

525.39 

529.24 

528.05 

528.58 

530.39 

533.35 

538.27 

539.49 

540.1 

540.03

541.22 

542.3 

542.32 

544.74 

546.22 

546.95 

553.57 

550.76 

561.11 

564.78 

566.29 

578.03 

585.28 

589.38 



C60 

394.09 

413.1 

400.1 

404.5 

432.34 

446.81 

444.48 

440.74 

445.36 

450.7 

437.82 

446.53 

454.51 

447.85 

453.55 

468.5 

467.79 

471.67 

470.95 

474.51 

470.23 

495.74 

514.48 

528.69 

532.69 

532.16 

530.5 

544.06 

544.35 

546.89 

544.28 

545.25 

544.92 

551.21 

558.23 

558.41 

558.85 

558.68 

559.49 

562.99 

558.08 

563 

562.02 
561.33 
570.09 
566.55 
575.16 
586.57 
583.61 
593.31 
606.42 
605.71 



C70 

415.79 

436.63 

423.11 

424.69 

458.19 

463.66 

466.87 

464.69 

465.64 

469.71 

456.49 

465.67 

479.24 

464.79 

472.8 

489.75 

486.96 

488.2 

496.56 

495.44 

490.14 

517.2 

534.28 

551.62 

555.79 

549.74 

545.79 

561.81 

562.52 

564.25 

560.67 

562.67 

560.48 

569.98 

575.81 

576.14 

580.02 

578.4 

579.85 

583.1 

575 

582.1 

581.34 

580.4 

588.6 

584.71 

590.22 

606.5 

599.31 

609.63 

628.49 

622.55 



C80 

441.05 

462.23 

444.99 

447.07 

487.5 

486.12 

491.4 

492.68 

489.69 

492.94 

483.08 

485.73 

502.48 

486.07 

498.27 

517.14 

513.79 

508.12 

520.78 

519.71 

513.77 

538.97 

556.81 

577.12 

578.95 

570.49 

566.94 

584.13 

583.41 

585.48 

581.37 

582.22 

581.71 

595.36 

599.29

599.22 

602.23 

599.5 

603.79 

601.99 

597.36 

603.4 

605.04 

602.53 

605.15 

608.98 

607.93 

628.55 

619.45 

632.06 

650.36 

645.11 



C90 

474.02 

493.67 

475.82 

473.1 

522.02 

516.15 

525.96 

526.54 

517.03 

523.77 

517.31 

513.73 

532.5 

511.89 

533.26 

550.88 

550.91 

535.62 

561.39 

552.97 

546.85 

566.39 

584.4 

606.24 

610.72 

600.81 

597.45 

617.59 

618.81 

613.99 

610.49 

613.05 

611.21 

627.5 

631.11 

631.57 

630.95 

628.81 

634.08 

634.52 

630.15 

632.1 

633.9 

630.69 

632.38 

633.77 

633.74 

664.21 

646.86 

663.02 

682.75 

675.76 



C100

572.64 

605.63 

587.61 

568.35 

675.29 

616.07 

643.69 

625.6 

605.66 

630.09 

635.89 

640.56 

641.36 

598.19 

625.27 

683.02 

668.8 

633.81 

664.22 

651.2 

645 

664.19 

685.65 

759.49 

723.8 

709.82 

663.03 

727.17 

706.96 

736.33 

682.67 

712.85 

712.81 

747.72 

749.64 

762.48 

735.12 

735.93 

719.43 

745.32 

735.36 

729.98 

728.35 

730.26 

700.41 

724.21 

730.51 

779.82 

816.52 

785.75 

786.11 

771.89 
Table 9 Mathematics, equated slope parameters for common items

9-year-olds            13-year-olds
Item     Slope         Slope      Item
Item17   0.76897       0.73187    Item20
Item18   0.70382       0.83575    Item21
Item19   0.77018       0.66709    Item22
Item20   0.86174       0.59672    Item23
Item21   0.71834       0.56451    Item24
Item22   0.90387       0.67999    Item25
Item23   0.87985       1.03708    Item26
Item24   1.1638        0.99587    Item27
Item25   0.79035       0.82189    Item28
Item26   0.95702       0.99254    Item29
Item27   0.88049       1.11276    Item30
Item28   1.15055       1.24063    Item31
Item29   0.95931       0.87039    Item32
Item30   0.95709       1.07588    Item33



Table 10 Mathematics, equated threshold parameters for common items

9-year-olds            13-year-olds
Item     Threshold     Threshold  Item
Item17   -1.36162      -1.44386   Item20
Item18   -1.81124      -1.66592   Item21
Item19   -0.47919      -0.14352   Item22
Item20   -0.24729      -0.41038   Item23
Item21   -0.76059      -0.56463   Item24
Item22   -1.33062      -1.45374   Item25
Item23   -0.9?547      -0.58454   Item26
Item24   -0.16388      -0.38028   Item27
Item25   -0.45516      -0.33349   Item28
Item26   -0.84136      -0.74667   Item29
Item27   0.43431       -0.62607   Item30
Item28   -0.05627      0.23263    Item31
Item29   -0.4826       -0.60479   Item32
Item30   -0.23453      -0.53731   Item33



Table 11 Mathematics, equated guessing parameters for common items

9-year-olds            13-year-olds
Item     Guessing      Guessing   Item
Item17   0.23484       0.20564    Item20
Item18   0.25686       0.18289    Item21
Item19   0.27117       0.3104     Item22
Item20   0.28042       0.18696    Item23
Item21   0.27469       0.20861    Item24
Item22   0.16562       0.18137    Item25
Item23   0.19152       0.30164    Item26
Item24   0.30381       0.17997    Item27
Item25   0.17341       0.17542    Item28
Item26   0             0          Item29
Item27   0.32057       0.37623    Item30
Item28   0.27199       0.2069     Item31
Item29   0.1609        0.24596    Item32
Item30   0.23691       0.24811    Item33
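
As a rough illustration of how the equated mathematics parameters in Tables 9 to 11 can be read, the sketch below evaluates the three-parameter logistic item characteristic curve (ICC) for the first three common items under the 9-year-old and the 13-year-old estimates, and reports the largest gap between the two curves. The scaling constant D = 1.7, the theta grid from -3 to 3, and the names `icc_3pl` and `max_icc_gap` are assumptions made for this example; they are not taken from the report.

```python
import math

D = 1.7  # assumption: conventional logistic scaling constant, not stated in the tables


def icc_3pl(theta, a, b, c, d=D):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


# (slope a, threshold b, guessing c) copied from Tables 9, 10 and 11.
params_age9 = {
    "Item17": (0.76897, -1.36162, 0.23484),
    "Item18": (0.70382, -1.81124, 0.25686),
    "Item19": (0.77018, -0.47919, 0.27117),
}
params_age13 = {
    "Item20": (0.73187, -1.44386, 0.20564),
    "Item21": (0.83575, -1.66592, 0.18289),
    "Item22": (0.66709, -0.14352, 0.31040),
}
# Each 9-year-old item is paired with the 13-year-old item on the same table row.
pairs = [("Item17", "Item20"), ("Item18", "Item21"), ("Item19", "Item22")]


def max_icc_gap(p9, p13, lo=-3.0, hi=3.0, steps=121):
    """Largest absolute difference between two ICCs on an evenly spaced theta grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(abs(icc_3pl(t, *p9) - icc_3pl(t, *p13)) for t in grid)


gaps = {
    f"{i9}/{i13}": max_icc_gap(params_age9[i9], params_age13[i13])
    for i9, i13 in pairs
}
for label, gap in gaps.items():
    print(f"{label}: max |ICC difference| = {gap:.3f}")
```

A small gap for a pair suggests that the two sets of estimates describe nearly the same curve for that item, even when the individual slope, threshold, and guessing values differ.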
Table 12 Science, equated slope parameters for common items

9-year-olds            13-year-olds
Item     Slope         Slope      Item
Item16   0.16192       0.3003     Item16
Item17   0.68524       0.74228    Item17
Item18   0.75434       0.76781    Item18
Item19   0.72529       0.66119    Item19
Item20   0.55497       1.03132    Item20
Item21   0.80861       0.71951    Item21
Item22   0.53808       0.74359    Item22
Item23   0.60036       0.6129     Item23
Item24   0.48032       0.64799    Item24
Item25   0.58275       0.95714    Item25
Item26   0.76455       0.37457    Item26
Item27   0.29778       0.7748     Item27
Item28   0.74936       0.86323    Item28
Item29   0.94014       1.32178    Item29

Table 13 Science, equated threshold parameters for common items

9-year-olds            13-year-olds
Item     Threshold     Threshold  Item
Item16   -2.64404      -1.36454   Item16
Item17   -0.79376      -0.81066   Item17
Item18   -1.1249       -1.00337   Item18
Item19   -1.29098      -1.4272    Item19
Item20   -1.05326      -0.21995   Item20
Item21   -0.13286      -0.38295   Item21
Item22   -1.35272      -1.19196   Item22
Item23   -1.48899      -1.17802   Item23
Item24   -2.36079      -2.00635   Item24
Item25   -0.61151      -0.47411   Item25
Item26   -1.00494      0.17538    Item26
Item27   -0.78997      -0.14441   Item27
Item28   -0.63007      -0.47186   Item28
Item29   -0.25141      -0.5232    Item29

Table 14 Science, equated guessing parameters for common items

9-year-olds            13-year-olds
Item     Guessing      Guessing   Item
Item16   0.29306       0.34093    Item16
Item17   0.29953       0.34522    Item17
Item18   0.30201       0.26712    Item18
Item19   0.21697       0.26014    Item19
Item20   0.36796       0.48258    Item20
Item21   0.3591        0.37624    Item21
Item22   0.22984       0.31043    Item22
Item23   0.17459       0.23425    Item23
Item24   0.31988       0.27574    Item24
Item25   0.27565       0.40488    Item25
Item26   0.18804       0.30189    Item26
Item27   0.27049       0.41344    Item27
Item28   0.22244       0.255      Item28
Item29   0.27053       0.24219    Item29
Figure 1 Reference population item parameter estimates compared to population x item parameter estimates
Figure 2 Reference population item parameter estimates compared to population y item parameter estimates
Figure 3 Reference population item parameter estimates compared to population z item parameter estimates
Figure 4 Science, age 13, items 1 and 2, superimposed ICCs estimated from each individual population (solid lines) and the reference population (dashed line)
Figure 6 Science, age 13, items 5 and 6, superimposed ICCs estimated from each individual population (solid lines) and the reference population (dashed line)
Figure 7 Mathematics, ages 9 and 13, superimposed regression lines
Figure 9 Mathematics, ages 9 and 13, superimposed regression lines for equated population item parameter estimates
Figure 10 Science, ages 9 and 13, superimposed regression lines for equated population item parameter estimates
Figure 11 Mathematics, equated slope parameters for the common items.
Figure 12 Mathematics, equated threshold parameters for the common items.
Figure 13 Mathematics, equated guessing parameters for the common items.
Figure 14 Science, equated slope parameters for the common items.
Figure 15 Science, equated threshold parameters for the common items.
Figure 16 Science, equated guessing parameters for the common items.